FIELD OF THE INVENTION

The present invention relates generally to methods for
compressing binarized images.

BACKGROUND OF THE INVENTION 

Arithmetic coding is described in: 

Witten, I. H. et al., "Arithmetic coding for data compression",
Computing Practices, Communications of the ACM, June 1987,
Vol. 30(6); and

"Arithmetic coding and statistical modeling", Dr. 
Dobb's Journal, Feb. 1991, pp. 16-29. 

The MR decoding scheme is described in CCITT Recommen- 
dation T.4 and T.6 for Groups 3 and 4. 

A conventional binarizing technique is described in
Foley, J. et al., Computer Graphics: Principles and Practice, 2nd
Ed., Section 13.1.2, pages 568 - 573.

The disclosures of all of the above publications are 
hereby incorporated by reference. 

SUMMARY OF THE INVENTION

The present invention seeks to provide an improved 
image manipulation system. 

There is thus provided in accordance with a preferred
embodiment of the present invention a method for compressing
binarized images including receiving a binarized image and
generating a first sequence of first code symbols representing the
binarized image wherein at least one row of the image is
represented in run-length encoded format, and encoding a portion of
the first sequence of code symbols using a preliminary encoding
scheme, thereby to provide a first portion of a second sequence
of code symbols, and, while encoding, accumulating the frequency
of at least some of the first code symbols thus far encoded and
generating an additional portion of the second sequence using a
modified version of the code scheme such that at least one
subsequent code symbol in the first sequence with a large
accumulated frequency is encoded more compactly in the second portion
than at least one subsequent code symbol in the first sequence
with a small accumulated frequency.
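As an illustration of what "run-length encoded format" means for one row, the following is a minimal sketch only. The function name `row_to_runs` and the convention that the first run is white (and may have length 0) are assumptions made for this example; they are not taken from the Appendices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the alternating white/black run lengths of
   one row into runs[] and returns the number of runs written.
   pixels[i] is 0 for white and 1 for black. The first run is always a
   white run, so it has length 0 when the row starts with a black pixel. */
size_t row_to_runs(const unsigned char *pixels, size_t width,
                   unsigned *runs, size_t max_runs)
{
    size_t n = 0, i = 0;
    unsigned char color = 0;            /* start by counting white */

    assert(pixels != NULL && runs != NULL);
    while (i < width && n < max_runs) {
        unsigned len = 0;
        while (i < width && pixels[i] == color) {
            len++;
            i++;
        }
        runs[n++] = len;
        color = (unsigned char)!color;  /* switch to the other colour */
    }
    return n;
}
```

A row of three white, two black and three white pixels would yield the runs 3, 2, 3; these run lengths are the kind of first code symbols whose frequencies are then accumulated.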

Further in accordance with a preferred embodiment of 
the present invention, a modified Huffman coding scheme is em- 
ployed to generate the first sequence of first code symbols. 

In accordance with another preferred embodiment of the 
present invention, there is provided a method for compressing 
binarized images including receiving a binarized image and gener- 
ating a first sequence of first code symbols representing the 
binarized image including a representation of one row of the
binarized image and a representation of differences between at
least one subsequent row and at least one previous row, and
encoding a portion of the first sequence of code symbols using a 
preliminary encoding scheme, thereby to provide a first portion 
of a second sequence of code symbols, and, while encoding, 
accumulating the frequency of at least some of the first code 
symbols thus far encoded and generating an additional portion of 
the second sequence using a modified version of the code scheme 
such that at least one subsequent code symbol in the first se- 
quence with a large accumulated frequency is encoded more com- 
pactly in the second portion than at least one subsequent code 
symbol in the first sequence with a small accumulated frequency. 

Further in accordance with a preferred embodiment of 
the present invention, the encoding scheme used to encode the 
first sequence of code symbols is continually modified such that 
code symbols in the first sequence with a large accumulated 
frequency are encoded more compactly in the second portion than 
subsequent code symbols in the first sequence with a small accu- 
mulated frequency. 

Still further in accordance with a preferred embodiment
of the present invention, a modified-read coding scheme is
employed to generate the first sequence of first code symbols.

Further in accordance with a preferred embodiment of
the present invention, a modified modified-read coding scheme is
employed to generate the first sequence of first code symbols.

Still further in accordance with a preferred embodiment 
of the present invention, the method also includes binarizing a 
discrete level image, thereby to provide the binarized image.

Additionally in accordance with a preferred embodiment 
of the present invention, the method also includes binarizing a 
continuous level image, thereby to provide the binarized image. 

Still further in accordance with a preferred embodiment 
of the present invention, arithmetic coding is employed to trans- 
late the accumulated frequency of at least some of the first code 
symbols into second code symbols. 

There is also provided, in accordance with a preferred
embodiment of the present invention, apparatus for compressing 
binarized images including a run-length encoder operative to 
receive a binarized image and to generate a first sequence of 
first code symbols representing the binarized image wherein at 
least one row of the image is represented in run-length encoded 
format, and an adaptive encoder operative to encode a portion of 
the first sequence of code symbols using a preliminary encoding 
scheme, thereby to provide a first portion of a second sequence 
of code symbols, and, while encoding, to accumulate the frequen- 
cy of at least some of the first code symbols thus far encoded 
and to generate an additional portion of the second sequence 
using a modified version of the code scheme such that at least 
one subsequent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the second 
portion than at least one subsequent code symbol in the first 
sequence with a small accumulated frequency. 

There is further provided, in accordance with a preferred
embodiment of the present invention, apparatus for
compressing binarized images including a binarized image compressor
operative to receive a binarized image and to generate a first
sequence of first code symbols representing the binarized image, 
the first sequence including a representation of one row of the 
binarized image and a representation of differences between at 
least one subsequent row and at least one previous row, and an 
adaptive encoder operative to encode a portion of the first 
sequence of code symbols using a preliminary encoding scheme, 
thereby to provide a first portion of a second sequence of code 
symbols, and, while encoding, to accumulate the frequency of at
least some of the first code symbols thus far encoded and to 
generate an additional portion of the second sequence using a 
modified version of the code scheme such that at least one 
subsequent code symbol in the first sequence with a large accumu- 
lated frequency is encoded more compactly in the second portion 
than at least one subsequent code symbol in the first sequence 
with a small accumulated frequency. 

Further in accordance with a preferred embodiment of
the present invention, the binarized image compressor employs a
modified-read coding scheme to generate the first sequence of
first code symbols.

Still further in accordance with a preferred embodiment
of the present invention, the binarized image compressor employs
a modified modified-read coding scheme to generate the first
sequence of first code symbols.

Additionally in accordance with a preferred embodiment 
of the present invention, the adaptive encoder employs arithmetic 
coding to translate the accumulated frequency of at least some of 
the first code symbols into second code symbols.

Still further in accordance with a preferred embodiment 
of the present invention, the encoding scheme used to encode the 
first sequence of code symbols is continually modified such that 
code symbols in the first sequence with a large accumulated 
frequency are encoded more compactly in the second portion than 
subsequent code symbols in the first sequence with a small accu- 
mulated frequency. 

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and 
appreciated from the following detailed description, taken in 
conjunction with the drawings in which: 

Fig. 1 is a simplified block diagram of an image manip- 
ulation system constructed and operative in accordance with a 
preferred embodiment of the present invention, and 

Fig. 2 is a simplified flowchart illustrating a preferred
mode of operation in which the MR code element frequency
accumulation unit of Fig. 1 processes a single MR code element in
a sequence.

Attached herewith are the following appendices which 
aid in the understanding and appreciation of one preferred 
embodiment of the invention shown and described herein: 

Appendix A is a computer listing of a preferred soft- 
ware embodiment of the MR coding, arithmetic coding and MR code 
element frequency accumulation units of Fig. 1, and 

Appendix B is a computer listing of a preferred soft- 
ware embodiment of the arithmetic decoding, MR code frequency 
accumulation and MR decoding units of Fig. 1. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Reference is now made to Fig. 1 which is a simplified
block diagram of an image manipulation system constructed and
operative in accordance with a preferred embodiment of the
present invention.

As shown, a digital representation of an image is
provided from any suitable source, such as a scanner 10 which
scans a substrate such as a continuous level photograph 20, a
digital camera 30, a fax machine 40, an image creation
workstation 50 such as a Macintosh equipped with the Adobe Photoshop
software package, or a storage medium such as a hard disk 60. The
digital representation of the image may be either a continuous
level image or a discrete level image such as a document or other
black and white image.

If the digital representation of the image is not 
binary, the digital representation is binarized, as indicated in 
Fig. 1 by image binarization block 70, using any conventional 
binarizing technique such as those described in Foley, J. et al.,
Computer Graphics: Principles and Practice, 2nd Ed., Section
13.1.2, pages 568 - 573.
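For orientation only, the simplest conceivable binarization is a fixed threshold, sketched below. This is not necessarily the technique of the cited reference; the function name `binarize_threshold`, the 8-bit gray scale and the "darker is smaller" convention are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps each 8-bit gray level to 1 (black) or 0 (white).
   Assumes smaller gray values are darker; levels below the threshold
   become black. */
void binarize_threshold(const unsigned char *gray, unsigned char *bits,
                        size_t n, unsigned char threshold)
{
    size_t i;

    assert(gray != NULL && bits != NULL);
    for (i = 0; i < n; i++)
        bits[i] = (gray[i] < threshold) ? 1 : 0;
}
```

Continuous level images are usually binarized with halftoning rather than a single threshold, but the output format is the same: one binary value per pixel, ready for MR coding.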

The binarized image is then coded by MR coding unit 80,
using the MR coding scheme described in CCITT Recommendation T.4
and T.6 for Groups 3 or 4.

The MR coded binarized image generated by MR coding
unit 80 then undergoes arithmetic coding in arithmetic coding
unit 90. The arithmetic coding unit 90 receives as input:

a. the sequence of MR code elements which forms the MR
coded binarized image; and

b. the estimated probability of each MR code element,
which is provided by an MR code element frequency accumulation
unit 100.

Initially, the estimated probabilities of all MR code
elements are typically taken to be equal. However, as the MR code
element sequence flows into the MR code element frequency
accumulation unit 100, the estimated probabilities change based on the
number of times each MR code element is encountered.
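The adaptive estimate described above can be sketched with plain per-symbol counts. This is an illustrative sketch, not the Appendix code: the type `AdaptiveModel`, the function names and the alphabet size of 9 are assumptions, and the estimated probability of a symbol is taken to be `count[sym] / total`.

```c
#include <assert.h>

#define NUM_SYMBOLS 9   /* illustrative alphabet size (assumption) */

typedef struct {
    unsigned count[NUM_SYMBOLS];  /* times each symbol was seen, plus 1 */
    unsigned total;               /* sum of all counts */
} AdaptiveModel;

/* Start with equal estimated probabilities: every count is 1. */
void model_init(AdaptiveModel *m)
{
    int i;

    assert(m != 0);
    for (i = 0; i < NUM_SYMBOLS; i++)
        m->count[i] = 1;
    m->total = NUM_SYMBOLS;
}

/* Record one occurrence of sym; its estimate count/total rises. */
void model_observe(AdaptiveModel *m, int sym)
{
    assert(m != 0 && sym >= 0 && sym < NUM_SYMBOLS);
    m->count[sym]++;
    m->total++;
}
```

After two occurrences of symbol 3, its estimate moves from 1/9 to 3/11 while every other symbol drops from 1/9 to 1/11, which is what lets the coder spend fewer bits on frequent elements.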

The sequence of MR code elements typically includes
code elements of 3 types:

a. MR control type code elements;

b. Black run length type code elements; and

c. White run length type code elements.

The frequency accumulation unit 100 typically receives 
as input each MR code element and, associated therewith, an 
indication of the type of that MR code element. Typically, unit 
100 computes the relative code element frequency for each code 
element within its own code element type. 
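Keeping the statistics separate per code element type might look like the sketch below. The enum, struct and function names are hypothetical; only the three types and the alphabet sizes (9 control elements, 92 run-length elements per colour) come from the description.

```c
#include <assert.h>

typedef enum { MR_CONTROL = 0, BLACK_RL = 1, WHITE_RL = 2, NUM_TYPES = 3 } ElementType;

#define MAX_ALPHABET 92
static const int alphabet_size[NUM_TYPES] = { 9, 92, 92 };

typedef struct {
    unsigned count[NUM_TYPES][MAX_ALPHABET];
    unsigned total[NUM_TYPES];   /* denominator of the relative frequency */
} TypedCounts;

/* Every element of every type starts with the same count of 1. */
void typed_init(TypedCounts *c)
{
    int t, s;

    assert(c != 0);
    for (t = 0; t < NUM_TYPES; t++) {
        for (s = 0; s < alphabet_size[t]; s++)
            c->count[t][s] = 1;
        c->total[t] = (unsigned)alphabet_size[t];
    }
}

/* Only the table of the element's own type is touched. */
void typed_observe(TypedCounts *c, ElementType type, int sym)
{
    assert(c != 0 && sym >= 0 && sym < alphabet_size[type]);
    c->count[type][sym]++;
    c->total[type]++;
}
```

Because each type has its own denominator, a long stretch of white runs does not dilute the estimates for control elements.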

The arithmetic coding unit 90 may, if desired, be
replaced by an entropy encoder or an adaptive Huffman encoder. If
this is the case, then the arithmetic decoding unit 110,
described below, is replaced by an entropy decoder or adaptive
Huffman decoder, respectively.

One software embodiment of arithmetic coding unit 90 is 
described in "Arithmetic coding and statistical modeling", Dr. 
Dobb's Journal, Feb. 1991, pp. 16 - 29. The above reference also 
provides a software embodiment of arithmetic decoding unit 110. 

An alternative implementation of MR code element
frequency accumulation unit 100 is described below with reference to
Fig. 2.

The output of the arithmetic coding unit 90 is a very
compact representation of the original image which is suitable,
for example, for compact storage on any suitable optical or
magnetic medium and/or for rapid facsimile transmission on
conventional equipment which preferably has an error correction
capability, such as the V.32bis modem.

The compact representation of the original image is
decompressed after being transmitted or after being retrieved
from archival. To decompress the compact representation, the
compressed data stream is fed to an arithmetic decoding unit 110
which replaces each arithmetically coded element with a
corresponding MR code element according to the frequency of the
arithmetically coded element. The frequency information is
provided by an MR code element frequency accumulation unit 120 which
is typically identical to unit 100. Initially, the estimated
probabilities of all MR code elements are typically taken to be
equal. However, as the MR code element sequence flows into the MR
code element frequency accumulation unit 120, the estimated
probabilities change based on the number of times each MR code
element is encountered.
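On the decoding side, a textbook arithmetic decoder derives a cumulative count from its current code value and looks up which symbol's interval contains it. The sketch below shows only that lookup; the name `find_symbol` and the table layout (`cells[0] == 0`, `cells[num_symbols]` holding the total) are assumptions for illustration.

```c
#include <assert.h>

/* Returns the index i such that cells[i] <= target < cells[i + 1].
   cells[] is a cumulative frequency table with num_symbols + 1 entries. */
int find_symbol(const unsigned *cells, int num_symbols, unsigned target)
{
    int i;

    assert(cells != 0 && target < cells[num_symbols]);
    for (i = 0; i < num_symbols; i++)
        if (target < cells[i + 1])
            return i;
    return -1;  /* not reached when the precondition holds */
}
```

The lookup only gives the right answer if the decoder's table matches the encoder's at every step, which is why both sides must start from the same state and apply the same updates.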

The output of the arithmetic decoding unit 110 is a
sequence of MR code elements which is decoded by an MR decoding
unit 130 using the MR decoding scheme described in CCITT
Recommendation T.4 and T.6 for Groups 3 or 4.

The output of MR decoding unit 130 is a decompressed
binarized image which is substantially identical to the binarized
image generated by image binarization unit 70.

Fig. 2 is a simplified flowchart illustrating a preferred
mode of operation in which either of the MR code element
frequency accumulation units 100 or 120 of Fig. 1 processes a
single MR code element in a sequence of MR code elements.

If (process 210) there is a decision to reset, i.e. to
begin accumulating frequencies from zero, then the method
advances to stage 220. Otherwise, the method advances to stage
240. A reset is performed, for example, if a new image is to be
processed whose characteristics are thought to differ
significantly from the previous image processed.

In process 220, a table is allocated for each of the
three MR code element types. The number of cells in each table
typically exceeds the number of code elements of each type by 1.
The difference between the content of the i'th cell in the table
and the (i+1)th cell in the table, also termed herein "the i'th
interval", is indicative of the relative frequency of the i'th
code element, within its code element type.

Since there are 92 code elements of the White Run 
Length type and of the Black Run Length type, the tables for 
these two types each typically have 93 cells. Since there are 9 
code elements of the MR Control type, the table for the MR Con- 
trol type typically has 10 cells. 

PROCESS 230: The table contents are initialized by 
generating equal intervals such as, typically, intervals having a 
length of 1. 
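Processes 220 and 230 can be sketched as follows. The table sizes follow the description and the initial contents follow the flowchart of Fig. 2 (cell i holds the value i, so every interval has length 1); the struct and function names are assumptions of this example.

```c
#include <assert.h>

#define NUM_RL_CODES      92  /* run-length code elements per colour */
#define NUM_CONTROL_CODES  9  /* MR control code elements */

/* One cumulative table per code element type; each has one more cell
   than the number of code elements of that type. */
typedef struct {
    unsigned white_rl[NUM_RL_CODES + 1];
    unsigned black_rl[NUM_RL_CODES + 1];
    unsigned mr_control[NUM_CONTROL_CODES + 1];
} FreqTables;

/* Make every interval cells[i + 1] - cells[i] equal to 1. */
static void init_table(unsigned *cells, int num_symbols)
{
    int i;

    for (i = 0; i <= num_symbols; i++)
        cells[i] = (unsigned)i;
}

/* Reset all three tables to equal intervals (processes 220 and 230). */
void reset_tables(FreqTables *t)
{
    assert(t != 0);
    init_table(t->white_rl, NUM_RL_CODES);
    init_table(t->black_rl, NUM_RL_CODES);
    init_table(t->mr_control, NUM_CONTROL_CODES);
}
```

Storing cumulative values rather than raw counts means the last cell always holds the total, and the two cells bracketing a symbol give the coder its interval directly.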

PROCESS 240: Input is received: A single MR code
element from the MR code element sequence representing the image,
and, associated therewith, its MR code element type, is received
as input.

PROCESS 250: Unit 100 allows arithmetic coder 90 to 
arithmetically code the current MR code element, by supplying the 
frequency intervals stored in the table corresponding to the 
current MR code element to the arithmetic coder 90. For example, 
if the MR code element is of the MR_control type, the intervals 
stored in the MR_control table are employed. 
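To show how supplied intervals are typically consumed, here is one range-narrowing step in the style of a textbook integer arithmetic coder. It is a sketch under stated assumptions, not the Appendix implementation: the name `narrow_range`, the 16-bit starting range used in the example and the omission of renormalisation and bit output are all choices made for brevity.

```c
#include <assert.h>

/* Narrows [*low, *high] to the sub-range belonging to symbol sym.
   cells[] is a cumulative table with cells[0] == 0 and
   cells[num_symbols] equal to the total count. */
void narrow_range(unsigned long *low, unsigned long *high,
                  const unsigned *cells, int num_symbols, int sym)
{
    unsigned long range, scale;

    assert(low != 0 && high != 0 && cells != 0);
    assert(sym >= 0 && sym < num_symbols);
    range = *high - *low + 1;
    scale = cells[num_symbols];
    /* high is computed first because it needs the old value of low */
    *high = *low + (range * cells[sym + 1]) / scale - 1;
    *low  = *low + (range * cells[sym]) / scale;
}
```

With four equally likely symbols and the range 0..65535, symbol 1 narrows the range to 16384..32767, one quarter of the original. A symbol with a wider interval keeps more of the range and therefore costs fewer output bits.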

Unit 120 allows the decoder 110 to arithmetically 
decode the current MR code element, by supplying the same infor- 
mation to decoder 110. 

PROCESS 260: The appropriate table is updated by
incrementing by 1 the contents of each cell starting from the cell
following the cell corresponding to the current code element.

For example, if the fourth MR_control type code element
is encountered, the contents of the fifth to ninth cells of the
MR_control table are incremented by 1.
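The update of process 260 amounts to a short loop over a cumulative table. The sketch assumes zero-based numbering of both cells and code elements, which is one reading consistent with the example above (element number four, cells five to nine of a ten-cell table); the function name is hypothetical.

```c
#include <assert.h>

/* Increase the frequency of code element sym by 1: every cell after
   cells[sym] moves up by one, so only the interval
   cells[sym + 1] - cells[sym] grows. The table has num_symbols + 1 cells. */
void increase_frequency(unsigned *cells, int num_symbols, int sym)
{
    int i;

    assert(cells != 0 && sym >= 0 && sym < num_symbols);
    for (i = sym + 1; i <= num_symbols; i++)
        cells[i]++;
}
```

The cost is proportional to the number of cells after the current element, which is small for a 10-cell control table and at most 93 steps for a run-length table.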

Preferably, old frequency information is given less 
weight than new frequency information. One implementation of this 
rule is: 

PROCESS 270: For each type t, each time N_t code
elements of type t have been processed, divide the cell contents of
the frequency interval table of type t by a suitable number such
as 2. Suitable N_t values are: 256 for the MR control type and
2048 for the black and white run length types.
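A possible shape for process 270 is sketched below. Halving follows the description; forcing each cell to exceed its predecessor follows the flowchart of Fig. 2 and keeps every interval at least 1. The `TypeTable` struct, the `processed` counter and the function names are assumptions of this example.

```c
#include <assert.h>

/* Halve every cell, then restore strictly increasing cells so that no
   code element ends up with a zero-length interval. */
void halve_table(unsigned *cells, int num_symbols)
{
    int i;

    assert(cells != 0 && num_symbols > 0);
    cells[0] /= 2;
    for (i = 1; i <= num_symbols; i++) {
        cells[i] /= 2;
        if (cells[i] <= cells[i - 1])
            cells[i] = cells[i - 1] + 1;
    }
}

/* Hypothetical per-type bookkeeping. */
typedef struct {
    unsigned *cells;     /* cumulative table of this type */
    int num_symbols;
    unsigned processed;  /* elements seen since the last halving */
    unsigned limit;      /* N_t, e.g. 256 or 2048 in the description */
} TypeTable;

/* Call once per processed code element of this type. */
void note_processed(TypeTable *t)
{
    assert(t != 0);
    if (++t->processed >= t->limit) {
        halve_table(t->cells, t->num_symbols);
        t->processed = 0;
    }
}
```

For a table with intervals 1, 1, 8, 1 the halving yields intervals 1, 1, 3, 1: the dominant element stays dominant, but new observations now shift the estimate faster than before.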

Appendix A is a computer listing, in C language, of a
preferred software embodiment of the MR coding, arithmetic coding
and MR code element frequency accumulation units of Fig. 1.

Appendix B is a computer listing, in C language, of a
preferred software embodiment of the arithmetic decoding, MR code
element frequency accumulation and MR decoding units of Fig. 1.

The programs listed in Appendices A and B may be run on 
a conventional computer such as any UNIX computer. 

It is appreciated that the MR coding described
hereinabove may alternatively be replaced by MMR coding or other
similar coding schemes.

It is appreciated that the invention shown and
described herein is suitable for compressing and decompressing
any type of binarized image, such as binarized discrete level
images or binarized continuous level images, also termed herein
"halftone images".

In certain applications, it may be desirable to use the 
compression methods shown and described herein to compress only a 
portion of a binarized image. For example, in medical imaging 
applications, the compression methods shown and described herein 
may be employed to generally losslessly compress the foreground 
of the medical image whereas the background of the medical image 
may be compressed using lossy techniques. 

Accumulation of frequencies, as described above, is not
limited to accumulation of conventional, nonconditional
frequencies only and is intended to include accumulation of
conventional, nonconditional frequencies and/or accumulation of conditional
frequencies such as the conditional frequency of appearance of a
first code symbol in a given position, given that a second code
symbol or sequence of code symbols appeared in the preceding
position or sequence of preceding positions.
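One way to picture conditional frequencies is a table indexed by the preceding symbol as well as the current one. The sketch below uses a context of exactly one preceding symbol and an alphabet of 9; both choices, and all names, are assumptions made for illustration.

```c
#include <assert.h>

#define NUM_SYMBOLS 9   /* illustrative alphabet size (assumption) */

/* count[prev][sym]: how often sym followed prev, plus an initial 1. */
typedef struct {
    unsigned count[NUM_SYMBOLS][NUM_SYMBOLS];
} ContextModel;

void context_init(ContextModel *m)
{
    int p, s;

    assert(m != 0);
    for (p = 0; p < NUM_SYMBOLS; p++)
        for (s = 0; s < NUM_SYMBOLS; s++)
            m->count[p][s] = 1;
}

/* Record that sym appeared immediately after prev. */
void context_observe(ContextModel *m, int prev, int sym)
{
    assert(m != 0);
    assert(prev >= 0 && prev < NUM_SYMBOLS && sym >= 0 && sym < NUM_SYMBOLS);
    m->count[prev][sym]++;
}

/* Denominator of the conditional estimate count[prev][sym] / total. */
unsigned context_total(const ContextModel *m, int prev)
{
    unsigned total = 0;
    int s;

    assert(m != 0 && prev >= 0 && prev < NUM_SYMBOLS);
    for (s = 0; s < NUM_SYMBOLS; s++)
        total += m->count[prev][s];
    return total;
}
```

The memory cost grows with the square of the alphabet size for a one-symbol context, so longer contexts are usually restricted to small alphabets.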

It is appreciated that the software components of the
present invention may, if desired, be implemented in ROM
(read-only memory) form. The software components may, generally, be
implemented in hardware, if desired, using conventional
techniques.

It is appreciated that the particular embodiment de- 
scribed in the Appendices is intended only to provide an extreme- 
ly detailed disclosure of the present invention and is not in- 
tended to be limiting. 

It is appreciated that various features of the 
invention which are, for clarity, described in the contexts of 
separate embodiments may also be provided in combination in a 
single embodiment. Conversely, various features of the invention 
which are, for brevity, described in the context of a single 
embodiment may also be provided separately or in any suitable 
subcombination. 

It will be appreciated by persons skilled in the art 
that the present invention is not limited to what has been 
particularly shown and described hereinabove. Rather, the scope 
of the present invention is defined only by the claims that 
follow: 

CLAIMS

1. A method for compressing binarized images comprising:

receiving a binarized image and generating a first
sequence of first code symbols representing the binarized image
wherein at least one row of the image is represented in
run-length encoded format; and

encoding a portion of the first sequence of code
symbols using a preliminary encoding scheme, thereby to provide a
first portion of a second sequence of code symbols, and, while
encoding, accumulating the frequency of at least some of the
first code symbols thus far encoded and generating an additional
portion of the second sequence using a modified version of the
code scheme such that at least one subsequent code symbol in the
first sequence with a large accumulated frequency is encoded more
compactly in the second portion than at least one subsequent code
symbol in the first sequence with a small accumulated frequency.

2. A method according to claim 1 wherein a modified
Huffman coding scheme is employed to generate the first sequence of
first code symbols.

3. A method for compressing binarized images comprising:

receiving a binarized image and generating a first
sequence of first code symbols representing the binarized image
comprising a representation of one row of the binarized image and
a representation of differences between at least one subsequent
row and at least one previous row; and

encoding a portion of the first sequence of code
symbols using a preliminary encoding scheme, thereby to provide a
first portion of a second sequence of code symbols, and, while
encoding, accumulating the frequency of at least some of the
first code symbols thus far encoded and generating an additional
portion of the second sequence using a modified version of the
code scheme such that at least one subsequent code symbol in the
first sequence with a large accumulated frequency is encoded more
compactly in the second portion than at least one subsequent code
symbol in the first sequence with a small accumulated frequency.

4. A method according to any of claims 1-3 wherein the
encoding scheme used to encode the first sequence of code symbols
is continually modified such that code symbols in the first
sequence with a large accumulated frequency are encoded more
compactly in the second portion than subsequent code symbols in
the first sequence with a small accumulated frequency.

5. A method according to any of the preceding claims
wherein a modified-read coding scheme is employed to generate the
first sequence of first code symbols.

6. A method according to any of the preceding claims 1-4
wherein a modified modified-read coding scheme is employed to
generate the first sequence of first code symbols.

7. A method according to any of the preceding claims and
also comprising binarizing a discrete level image, thereby to
provide the binarized image.

8. A method according to any of the preceding claims 1-6
and also comprising binarizing a continuous level image, thereby
to provide the binarized image.

9. A method according to any of the preceding claims 
wherein arithmetic coding is employed to translate the accumulat- 
ed frequency of at least some of the first code symbols into 
second code symbols. 

10. Apparatus for compressing binarized images comprising: 

a run-length encoder operative to receive a binarized 
image and to generate a first sequence of first code symbols 
representing the binarized image wherein at least one row of the 
image is represented in run-length encoded format; and 

an adaptive encoder operative to encode a portion of 
the first sequence of code symbols using a preliminary encoding 
scheme, thereby to provide a first portion of a second sequence 
of code symbols, and, while encoding, to accumulate the frequen- 
cy of at least some of the first code symbols thus far encoded 
and to generate an additional portion of the second sequence 
using a modified version of the code scheme such that at least 
one subsequent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the second 
portion than at least one subsequent code symbol in the first 
sequence with a small accumulated frequency. 

11. Apparatus for compressing binarized images comprising:

a binarized image compressor operative to receive a 
binarized image and to generate a first sequence of first code 
symbols representing the binarized image, the first sequence 
comprising a representation of one row of the binarized image and 
a representation of differences between at least one subsequent 
row and at least one previous row; and 

an adaptive encoder operative to encode a portion of 
the first sequence of code symbols using a preliminary encoding 
scheme, thereby to provide a first portion of a second sequence 
of code symbols, and, while encoding, to accumulate the frequen- 
cy of at least some of the first code symbols thus far encoded 
and to generate an additional portion of the second sequence 
using a modified version of the code scheme such that at least 
one subsequent code symbol in the first sequence with a large 
accumulated frequency is encoded more compactly in the second 
portion than at least one subsequent code symbol in the first 
sequence with a small accumulated frequency. 

12. Apparatus according to any of the preceding claims 10 -
11 wherein the binarized image compressor employs a modified-read
coding scheme to generate the first sequence of first code
symbols.

13. Apparatus according to any of the preceding claims 10
- 11 wherein the binarized image compressor employs a modified
modified-read coding scheme to generate the first sequence of
first code symbols.



14. Apparatus according to any of the preceding claims 10 - 
13 wherein the adaptive encoder employs arithmetic coding to 
translate the accumulated frequency of at least some of the first 
code symbols into second code symbols. 

15. Apparatus according to any of claims 10 - 14 wherein 
the encoding scheme used to encode the first sequence of code 
symbols is continually modified such that code symbols in the 
first sequence with a large accumulated frequency are encoded 
more compactly in the second portion than subsequent code symbols 
in the first sequence with a small accumulated frequency. 

16. Apparatus according to any of the preceding claims 10 - 
15 and substantially as shown and described above. 

17. Apparatus according to any of the preceding claims 10 - 
15 and substantially as illustrated in any of the drawings. 

18. A method according to any of the preceding claims 1-9 
and substantially as shown and described above. 

19. A method according to any of the preceding claims 1-9
and substantially as illustrated in any of the drawings.

FIG. 2 (flowchart text)

220  ALLOCATE TABLES:
     WHITE_RL[93]
     BLACK_RL[93]
     MR_CONTROL[10]

230  FOR i = 0 TO 92
         WHITE_RL[i] = BLACK_RL[i] = i
     FOR i = 0 TO 9
         MR_CONTROL[i] = i

240  GET MR CODE ELEMENT & MR CODE ELEMENT TYPE

250  SUPPLY THE FREQUENCY INTERVALS STORED IN THE
     APPROPRIATE TABLE TO ARITHMETIC CODER 90
     OR ARITHMETIC DECODER 110

260  UPDATE APPROPRIATE TABLE:
     INCREASE FREQUENCY OF CURRENT CODE ELEMENT

270  FOR EACH TYPE t,
     REFRESH_STATISTICAL_TABLES:
     IF TABLE[NUM_OF_SYMBOLS] = N_t
         FOR i = 0 TO NUM_OF_SYMBOLS
             TABLE[i] = TABLE[i] / 2
             IF TABLE[i] <= TABLE[i-1]
                 TABLE[i] = TABLE[i-1] + 1
         NEXT i

The following sources implement the suggested compression technique
previously described.

The agcmp program compresses a raw binary file (with no headers and with 
a known line length) to a compressed file on the disk. 

FILES: 

agcmp.c - the main loop for compression; converts the raw file to
MR codes and passes them to the arithmetic coder.

The following sources are common to both programs, agcmp and
agexp (decompression), and handle the statistical estimation
(element frequency accumulation) and the arithmetic coding:

amdl.c - statistical estimation. Based on a source from Dr. Dobb's
Journal, February 1991, "Arithmetic Coding and Statistical
Modeling" by Mark R. Nelson, but modified to fit compression
of MR codes.

acoder.c, abitio.c - implement the arithmetic coder, based on Dr. Dobb's
Journal.

COMPILATION:

agcmp: cc agcmp.c amdl.c acoder.c abitio.c

FURTHER INFORMATION about agcmp.c:

AUTHOR: Arik Cordon

INPUT: A rastered file (no headers!) with 1728 binary pixels per line
OUTPUT: compressed file.
USAGE: agcmp IN_FILE OUT_FILE

DESC: This source opens a rastered binary file, converts it to codes
according to the MR standard, and passes the codes to the arithmetic
coder. The compressed file is constructed from a header (see agcmp.h)
and the compressed entropy coded stream.

*********************************************************** 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dos.h>
#include "acoder.h"
#include "amodel.h"
#include "abitio.h"
#include "agcmp.h"

static char *last_line_in_prev_strip;

long agcmp(char *infile, char *outfile); // returns size in bytes
long add_file(char *in, int out);

long mr_compress_strip(char bufi[], int lines);
void modified_READ(char *prev, char *cur, char *next, int length);
void one_line_modified_read(char *prev, char *curr, int length);
void put_rl(int len, int color);
void put_code(int len, int color);
void put_EOL();
int find_next(int color, int pos, char *line, int len);
void erase_single_dots(char *prev, char *curr, char *next, int len);

main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "\nUsage: %s IMG_file_name G3_output_file_name\n", argv[0]);
        exit(9);
    }

    printf("total_bytes = %ld\n", agcmp(argv[1], argv[2]));
}



long agcmp(char *infile, char *outfile) // returns size in bytes
{
    long total_bytes = 0L;
    unsigned int i, j = 0, k, file_count;
    char *bufi;
    unsigned size_in_bytes;
    AG_HEADER ag_header;
    int fdi, fdtmp;

    if ((fdi = open(infile, O_RDONLY | O_BINARY, S_IREAD | S_IWRITE)) < 1)
        BigErr(0, "cmr: Can't open");

    /* INITIALIZE ARITHMETIC CODER */
    initialize_model();
    init_mr_model();
    initialize_output_bitstream(outfile, &ag_header, sizeof(AG_HEADER));
    initialize_arithmetic_encoder();

    if ((bufi = malloc(STRIP_SIZE * BYTES_PER_LINE)) == NULL)
        BigErr(0, "AGCMP: no mem");

    if ((last_line_in_prev_strip = malloc(PELS_PER_LINE)) == NULL)
        BigErr(0, "agcmp1: no mem");

    memset(last_line_in_prev_strip, 0, PELS_PER_LINE);
    ag_header.number_of_lines_in_file = 0;

    /* Main loop for compression: read and compress one strip at a time */
    while ((file_count = read(fdi, bufi, STRIP_SIZE * BYTES_PER_LINE)) >= BYTES_PER_LINE) {
        fprintf(stderr, "COMPRESSING STRIP #%d\r", j++);
        ag_header.number_of_lines_in_file += file_count / BYTES_PER_LINE;
        mr_compress_strip(bufi, file_count / BYTES_PER_LINE);
        _heapmin();
    }
    fprintf(stderr, "\n");

    free(last_line_in_prev_strip);
    free(bufi);
    close(fdi);
    _heapmin();

    /* Finish and close arithmetic coding */
    code_EOF();
    flush_arithmetic_encoder();
    total_bytes = flush_output_bitstream(&ag_header, sizeof(AG_HEADER));
    free_amdl_bufs();

    return(total_bytes);
}

/* compress one strip (arbitrary size, defined in agcmp.h) */
long mr_compress_strip(char bufi[], int lines)
{
    char array[3][PELS_PER_LINE];
    unsigned k, i, cur_line = 2, off;

    // Fill first 2 lines in array.
    for (k = 0; k < min(2, lines); k++)
        for (i = 0; i < PELS_PER_LINE; i++)
            array[k+1][i] = ((bufi[k*BYTES_PER_LINE + i/8] & (1 << (7-(i%8)))) != 0);

    if (lines > 0) // There is at least 1 line to compress
        modified_READ(NULL, &array[1][0], &array[2][0], PELS_PER_LINE); // First array compression

    /* convert packed bits to "1 bit per byte" format */
    while (cur_line < lines) {
        memcpy(&array[0][0], &array[1][0], 2 * PELS_PER_LINE);

        for (i = 0; i < PELS_PER_LINE; i++) {
            off = cur_line * BYTES_PER_LINE + i/8;
            if (bufi[off] == 0) {
                memset(&(array[2][i]), 0, 8);
                i += 7;
                continue;
            }
            /* compare as unsigned: plain char may be signed */
            if ((unsigned char)bufi[off] == 255) {
                memset(&(array[2][i]), 1, 8);
                i += 7;
                continue;
            }
            array[2][i] = ((bufi[off] & (1 << (7-(i%8)))) != 0);
        }

        cur_line++;

        /* compress one line (given the previous line) */
        /* (we also provide the next line in case some filtering is
           desired) */
        modified_READ(&array[0][0], &array[1][0], &array[2][0], PELS_PER_LINE);
    }

    /* do last line */
    if (lines > 1) {
        memcpy(&array[0][0], &array[1][0], 2 * PELS_PER_LINE);
        modified_READ(&array[0][0], &array[1][0], NULL, PELS_PER_LINE);
    }

    return(1);
}
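
The conversion from packed bits to the "1 bit per byte" format used above can be illustrated in isolation. The following is a minimal sketch only; the function name unpack_line is hypothetical and does not appear in the listing, and it assumes the most significant bit of each byte holds the leftmost pixel, as the mask (1 << (7-(i%8))) suggests.

```c
/* Hypothetical helper (not part of the listing): expands n packed
   pixels, MSB first, into one byte per pixel holding 0 or 1. */
void unpack_line(const unsigned char *packed, int n, char *pels)
{
    int i;
    for (i = 0; i < n; i++)
        pels[i] = (packed[i / 8] & (1 << (7 - (i % 8)))) != 0;
}
```

Unlike the listing, this sketch does not special-case all-white or all-black bytes; that shortcut only saves time and does not change the result.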

void modified_READ(char *prev, char *cur, char *next, int length)
{
    int j;
    long i;

    cur[0] = WHITE; // don't accept a black pixel on line beginning

    if (prev == NULL) {
        one_line_modified_read(last_line_in_prev_strip, cur, length);
        return;
    }

    memcpy(last_line_in_prev_strip, cur, PELS_PER_LINE);
    one_line_modified_read(prev, cur, length);
}

/* Here we actually translate the line to MR codes + Run-Lengths
   and pass the codes to the arithmetic coder */
void one_line_modified_read(char *prev, char *curr, int length)
{
    int a0, a1, a2, b1, b2, a0_color;

    a0 = -1; a0_color = WHITE;

    // *curr = WHITE; // don't accept a black pixel on line beginning
    do {
        a1 = find_next(!a0_color, a0+1, curr, length);
        a2 = find_next(a0_color, a1+1, curr, length);

        if (a0 == -1)
            b1 = find_next(!a0_color, a0+1, prev, length);
        else if (prev[a0] == a0_color)
            b1 = find_next(!a0_color, a0+1, prev, length);
        else {
            b1 = find_next(a0_color, a0+1, prev, length);
            b1 = find_next(!a0_color, b1+1, prev, length);
        }

        b2 = find_next(a0_color, b1+1, prev, length);

        // code it
        if (b2 < a1) { // pass mode
            //printf("PASS (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
            code_1(MR_CONTROL, PASS);
            a0 = b2;
        } else if (abs(a1-b1) <= 3) { // vertical mode
            switch (a1-b1) {
            case 0:
                //printf("V0 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, V0);
                break;
            case 1:
                //printf("VR1 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR1);
                break;
            case -1:
                //printf("VL1 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL1);
                break;
            case 2:
                //printf("VR2 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR2);
                break;
            case -2:
                //printf("VL2 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL2);
                break;
            case 3:
                //printf("VR3 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VR3);
                break;
            case -3:
                //printf("VL3 (a0=%d, a1=%d, a2=%d, b1=%d, b2=%d)\n", a0, a1, a2, b1, b2);
                code_1(MR_CONTROL, VL3);
                break;
            }
            a0 = a1;
        } else { // HORIZONTAL MODE
            if (a0 == -1)
                a0 = 0; /* the first run on a line starts at pixel 0 */
            //printf("horizontal color = %d, LEN1 = %d, LEN2 = %d (a0=%d)\n", a0_color, a1-a0, a2-a1, a0);
            code_1(MR_CONTROL, HOR);
            put_rl(a1-a0, a0_color);
            put_rl(a2-a1, !a0_color);
            a0 = a2;
        }

        if (a0 < length)
            a0_color = curr[a0];
    } while (a0 < length);
    //printf("EOL\n");

    //put_EOL(line); /* we don't need it because next a0 is beyond line */
}
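
The mode decision made in the loop above depends only on the positions of three changing elements. As an illustration, the decision can be isolated as follows. This is a sketch under stated assumptions: the enum constants and the name choose_mode are hypothetical and are not the MR_CONTROL symbol values used by the listing.

```c
#include <stdlib.h>

/* Hypothetical labels for the three coding modes. */
enum { MODE_PASS, MODE_VERTICAL, MODE_HORIZONTAL };

/* a1: next changing element on the coding line.
   b1, b2: first and second changing elements on the reference line
   to the right of a0. */
int choose_mode(int a1, int b1, int b2)
{
    if (b2 < a1)
        return MODE_PASS;        /* reference run ends before a1 */
    if (abs(a1 - b1) <= 3)
        return MODE_VERTICAL;    /* a1 is within 3 pixels of b1 */
    return MODE_HORIZONTAL;      /* fall back to two run-lengths */
}
```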

/* converts a single run-length (unlimited length) to several runs
   according to MR (Group 3,4) spec */
void put_rl(int len, int color)
{
    if (len > 63) {
        put_code((len / 64) + 63, color);
        len -= (len / 64) * 64;
    }

    put_code(len, color);
}
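
The split performed by put_rl can be checked with a small stand-alone sketch. The names split_run and join_run are hypothetical; join_run mirrors the arithmetic that the decoder applies in find_huf_len, on the assumption that code values above 63 denote multiples of 64.

```c
/* Hypothetical helper: splits a run into at most one make-up code
   (values 64 and up, meaning 64, 128, 192, ... pixels) followed by
   one terminating code (0..63). Returns the number of codes. */
int split_run(int len, int out[2])
{
    int n = 0;
    if (len > 63) {
        out[n++] = (len / 64) + 63;
        len -= (len / 64) * 64;
    }
    out[n++] = len;
    return n;
}

/* Hypothetical inverse: rebuilds the run length from its codes. */
int join_run(const int codes[], int n)
{
    int i, len = 0;
    for (i = 0; i < n; i++)
        len += (codes[i] > 63) ? (codes[i] - 63) * 64 : codes[i];
    return len;
}
```

For example, a run of 200 pixels becomes the make-up code 66 (192 pixels) followed by the terminating code 8.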

/* codes one legitimate run */
void put_code(int len, int color)
{
    code_1(color, BW_SYMBOLS - len - 1);
}

/* we do not need this if we know the line length in advance */
void put_EOL()
{
    //code_1(WHITE, EOL);
    //code_1(BLACK, EOL);
}

/* finds the next color interchange */
find_next(int color, int pos, char *line, int len)
{
    int i;
    char *ptr;

    if (pos > len-1)
        return(len);

    if ((ptr = memchr(line + pos, color, len-pos)) == NULL)
        return len;
    else
        return (ptr-line);
}

BigErr(int n, char *s) // too many bits in strip.
{
    printf("Err %d - %s", n, s);
    exit(9);
}

/* codes 1 symbol (Control or Black Run or White Run) */
code_1(int mode, int c)
{
    SYMBOL s;

    convert_int_to_symbol(c, &s, mode);
    encode_symbol(&s);
    update_model(c);
}

/* to finish with the arithmetic coding: */
code_EOF()
{
    SYMBOL s;

    convert_int_to_symbol(EOF, &s, MR_CONTROL);
    encode_symbol(&s);
}

/* Desc: Header file for agcmp.c, agexp.c */
/* AUTHOR: Arik Cordon */

/* This is a header that appears at the beginning of the compressed file */
typedef struct AGHEADER {
    long total_bytes;
    long number_of_lines_in_file;
} AG_HEADER;

/* in our implementation we assume a standard fax document with 1728 pixels
   per line */
#define PELS_PER_LINE 1728
#define BYTES_PER_LINE 216

#define STRIP_SIZE 100

#define WHITE 0
#define BLACK 1
#define MR_CONTROL 2

#define MR_SYMBOLS 9
#define BW_SYMBOLS 93

#define V0 8
#define PASS 2
#define VL1 3
#define VR1 4
#define HOR 5
#define VL2 6
#define VL3 7
#define VR2 1
#define VR3 0

#define beep() putch(7)
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/* 

 * Listing 9 - amdl.c
 *
 * author: Originally from Dr. Dobbs, Feb 1991, substantially modified
 * by Arik Cordon.
 *
 * This is the statistical estimation module for compressing
 * MR codes. There are three types of codes: MR_CONTROL, black run-length
 * and white run-length. For each type we have a separate statistical
 * estimator, of order 0 for run-lengths and order 2 for MR_CONTROL.
 *
 * This is a relatively simple model. For each symbol type,
 * the totals for all of the symbols are stored in a corresponding
 * array (e.g. "mr_storage"). This array has valid indices from -1
 * to Ni. The reason for having a -1 element is that the EOF
 * symbol is included in the table, and it has a value of -1.
 * (Ni = number of different symbols for each type)
 *
 * The total count for all the symbols is stored in totals[Ni], and
 * the low and high counts for symbol c are found in array[c] and
 * array[c + 1].
 */
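
The table layout described in the comment above, with valid indices from -1 to Ni, can be sketched as follows. This is an illustration only; new_totals and free_totals are hypothetical names that do not occur in the listing, which keeps separate storage and totals pointers instead.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n + 2 counters and returns a pointer
   offset by one element, so that indices -1..n are valid. Every symbol
   (including EOF at -1) starts with a count of 1. */
short *new_totals(int n)
{
    short *storage = malloc((n + 2) * sizeof *storage);
    short *totals;
    int i;

    if (storage == NULL)
        return NULL;
    totals = storage + 1;            /* totals[-1] aliases storage[0] */
    for (i = -1; i <= n; i++)
        totals[i] = (short)(i + 1);
    return totals;
}

/* Releases a table obtained from new_totals. */
void free_totals(short *totals)
{
    if (totals != NULL)
        free(totals - 1);
}
```

With this layout the count of symbol c is totals[c + 1] - totals[c], and totals[n] is the scale against which all counts are measured.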

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <io.h>
#include <errno.h>
#include <fcntl.h>
#include <sys\types.h>
#include <sys\stat.h>

#include "AGCMP.h"
#include "acoder.h"
#include "amodel.h"

/*
 * In order to create an array with indices -1 through num_of_symbols, I have
 * to do this funny declaration. totals[-1] == storage[0].
 */

short int **mr_storage;
short int *wt_storage;
short int *bl_storage;
short int *totals;

static int num_of_symbols, maximum_scale;
static int prev, prev1;

/*
 * When the model is first started up, each symbol has a count of
 * 1, which means a low value of c + 1, and a high value of c + 2.
 */
void initialize_model()
{
    int i, j, order_2_symbols;

    prev = prev1 = 0;
    num_of_symbols = MR_SYMBOLS;
    order_2_symbols = num_of_symbols * num_of_symbols;
    mr_storage = (short int **)malloc(sizeof(int *) * (order_2_symbols));

    for (i = 0; i < order_2_symbols; i++)
        mr_storage[i] = malloc(sizeof(int) * (num_of_symbols + 2));

    for (j = 0; j < order_2_symbols; j++) {
        totals = &(mr_storage[j][1]);
        for (i = -1; i <= num_of_symbols; i++)
            totals[i] = i + 1;
    }

    num_of_symbols = BW_SYMBOLS;

    wt_storage = malloc((num_of_symbols + 2) * sizeof(int));
    totals = &(wt_storage[1]);
    for (i = -1; i <= num_of_symbols; i++)
        totals[i] = i + 1;

    bl_storage = malloc((num_of_symbols + 2) * sizeof(int));
    totals = &(bl_storage[1]);
    for (i = -1; i <= num_of_symbols; i++)
        totals[i] = i + 1;
}

/*
 * Updating the model means incrementing every single count from
 * the high value for the symbol on up to the total. Then, there
 * is a complication. If the cumulative total has gone up to
 * the maximum value, we need to rescale. Fortunately, the rescale
 * operation is relatively rare.
 */
void update_model(int symbol)
{
    int i;

    for (symbol++; symbol <= num_of_symbols; symbol++)
        totals[symbol]++;
    if (totals[num_of_symbols] == maximum_scale)
    {
        for (i = 0; i <= num_of_symbols; i++)
        {
            totals[i] /= 2;
            if (totals[i] <= totals[i-1])
                totals[i] = totals[i-1] + 1;
        }
    }
}
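
The update and rescale steps can be exercised on a tiny table. The following sketch restates the routine with its state passed as parameters instead of globals; the name bump and its parameter list are assumptions made for the illustration.

```c
/* Hypothetical variant of update_model with explicit state.
   totals[] is a cumulative table with valid indices -1..n, and
   totals[n] is the scale. */
void bump(short *totals, int n, int symbol, int max_scale)
{
    int i;

    /* every cumulative entry above the symbol grows by one */
    for (i = symbol + 1; i <= n; i++)
        totals[i]++;

    /* halve all counts once the scale reaches the limit, while
       keeping every symbol's count at least 1 */
    if (totals[n] == max_scale) {
        for (i = 0; i <= n; i++) {
            totals[i] /= 2;
            if (totals[i] <= totals[i - 1])
                totals[i] = totals[i - 1] + 1;
        }
    }
}
```

The second loop preserves strict monotonicity of the table, so that no symbol ever ends up with a zero-width interval, which the arithmetic coder could not encode.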

/*
 * Finding the low count, high count, and scale for a symbol
 * is really easy, because of the way the totals are stored.
 * This is the one redeeming feature of the data structure used
 * in this implementation.
 */
int convert_int_to_symbol(int c, SYMBOL *s, int mode)
{
    switch (mode) {
    case WHITE:
        totals = wt_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case BLACK:
        totals = bl_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case MR_CONTROL:
        num_of_symbols = MR_SYMBOLS;
        totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;
        prev1 = prev;
        prev = c;
        maximum_scale = 256;
        break;
    }

    s->scale = totals[num_of_symbols];
    s->low_count = totals[c];
    s->high_count = totals[c + 1];
    return(0);
}

/*
 * Getting the scale for the current context is easy.
 */
void get_symbol_scale(SYMBOL *s, int mode, int prev, int prev1)
{
    switch (mode) {
    case WHITE:
        totals = wt_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case BLACK:
        totals = bl_storage + 1;
        num_of_symbols = BW_SYMBOLS;
        maximum_scale = 2048;
        break;
    case MR_CONTROL:
        num_of_symbols = MR_SYMBOLS;
        totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;
        maximum_scale = 256;
        break;
    }

    s->scale = totals[num_of_symbols];
}

/*
 * During decompression, we have to search through the table until
 * we find the symbol that straddles the "count" parameter. When
 * it is found, it is returned. The reason for also setting the
 * high count and low count is so that the symbol can be properly removed
 * from the arithmetic coded input.
 */
int convert_symbol_to_int(int count, SYMBOL *s)
{
    int c;

    for (c = num_of_symbols-1; count < totals[c]; c--)
        ;

    s->high_count = totals[c + 1];
    s->low_count = totals[c];
    return(c);
}
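
The table search used during decompression can be tried on a small cumulative table. This is a sketch only; find_symbol is a hypothetical name, and the table is passed in explicitly rather than read from the module's globals.

```c
/* Hypothetical stand-alone version of the search. totals[] has valid
   indices -1..n with totals[n] as the scale. Returns the symbol c for
   which totals[c] <= count < totals[c + 1]; -1 denotes EOF. */
int find_symbol(const short *totals, int n, int count)
{
    int c;

    for (c = n - 1; count < totals[c]; c--)
        ;
    return c;
}
```

The search is linear and runs from the highest symbol downward, so it terminates at index -1 at the latest, because totals[-1] is always 0 and count is never negative.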

/* The following is an optional module that initializes the statistical
   estimation tables with pre-defined values. It can slightly improve
   compression of small files */

init_mr_model()
{
    int i;

    update_initial_mr_model(V0, 6);
    update_initial_mr_model(VL1, 2);
    update_initial_mr_model(VR1, 2);
    update_initial_mr_model(HOR, 2);
    update_initial_mr_model(PASS, 1);
}

update_initial_mr_model(int symbol, int count)
{
    int i, prev, prev1, j;

    num_of_symbols = MR_SYMBOLS;
    maximum_scale = 256;

    for (prev = 0; prev < num_of_symbols; prev++)
        for (prev1 = 0; prev1 < num_of_symbols; prev1++) {
            totals = mr_storage[(prev1 * num_of_symbols + prev)] + 1;

            for (j = 0; j < count; j++)
                update_model(symbol);
        }
}

free_amdl_bufs()
{
    int i, order_2_symbols;

    num_of_symbols = MR_SYMBOLS;
    order_2_symbols = num_of_symbols * num_of_symbols;

    for (i = 0; i < order_2_symbols; i++)
        free(mr_storage[i]);
    free(mr_storage);

    num_of_symbols = BW_SYMBOLS;
    free(wt_storage);
    free(bl_storage);
    //_heapmin();
}

/*
 * Listing 8 - amodel.h
 *
 * This file contains all of the function prototypes and
 * external variable declarations needed to interface with
 * the modeling code found in amdl.c.
 */

/*
 * External variable declarations.
 */
extern int max_order;
extern int flushing_enabled;

/*
 * Prototypes for routines that can be called from MODEL-X.C
 */
void initialize_model(void);
void update_model(int symbol);
int convert_int_to_symbol(int symbol, SYMBOL *s, int mode);
void get_symbol_scale(SYMBOL *s, int mode, int prev, int prev1);
int convert_symbol_to_int(int count, SYMBOL *s);
void add_character_to_model(int c);
void flush_model(void);
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/* 

 * Listing 2 - coder.c
 *
 * SOURCE: Dr. Dobbs Journal, Feb 1991 + minor modifications by
 * Arik Cordon
 *
 * This file contains the code needed to accomplish arithmetic
 * coding of a symbol. All the routines in this module need
 * to know in order to accomplish coding is what the probabilities
 * and scales of the symbol counts are. This information is
 * generally passed in a SYMBOL structure.
 *
 * This code was first published by Ian H. Witten, Radford M. Neal
 * and John G. Cleary in "Communications of the ACM" in June 1987,
 * and has been modified slightly.
 */

#include <stdio.h>
#include "acoder.h"
#include "abitio.h"
#include "AGCMP.H"

/*
 * These four variables define the current state of the arithmetic
 * coder/decoder. They are assumed to be 16 bits long. Note that
 * by declaring them as short ints, they will actually be 16 bits
 * on most 80X86 and 680X0 machines, as well as VAXen.
 */
static unsigned short int code; /* The present input code value */
static unsigned short int low;  /* Start of the current code range */
static unsigned short int high; /* End of the current code range */
long underflow_bits;            /* Number of underflow bits pending */

/*
 * This routine must be called to initialize the encoding process.
 * The high register is initialized to all 1s, and it is assumed that
 * it has an infinite string of 1s to be shifted into the lower bit
 * positions when needed.
 */
void initialize_arithmetic_encoder()
{
    low = 0;
    high = 0xffff;
    underflow_bits = 0;
}



/*
 * This routine is called to encode a symbol. The symbol is passed
 * in the SYMBOL structure as a low count, a high count, and a range,
 * instead of the more conventional probability ranges. The encoding
 * process takes two steps. First, the values of high and low are
 * updated to take into account the range restriction created by the
 * new symbol. Then, as many bits as possible are shifted out to
 * the output stream. Finally, high and low are stable again and
 * the routine returns.
 */
void _fastcall encode_symbol(SYMBOL *s)
{
    long range;

    /*
     * These three lines rescale high and low for the new symbol.
     */
    range = (long)(high - low) + 1;
    high = low + (unsigned short int)
                 ((range * s->high_count) / s->scale - 1);
    low = low + (unsigned short int)
                 ((range * s->low_count) / s->scale);

    /*
     * This loop turns out new bits until high and low are far enough
     * apart to have stabilized.
     */
    for ( ; ; )
    {
        /*
         * If this test passes, it means that the MSDigits match, and can
         * be sent to the output stream.
         */
        if ((high & 0x8000) == (low & 0x8000))
        {
            output_bit(high & 0x8000);
            while (underflow_bits > 0)
            {
                output_bit(~high & 0x8000);
                underflow_bits--;
            }
        }
        /*
         * If this test passes, the numbers are in danger of underflow, because
         * the MSDigits don't match, and the 2nd digits are just one apart.
         */
        else if ((low & 0x4000) && !(high & 0x4000))
        {
            underflow_bits += 1;
            low &= 0x3fff;
            high |= 0x4000;
        }
        else
            return;
        low <<= 1;
        high <<= 1;
        high |= 1;
    }
}
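
The narrowing of the code range and the shifting-out of settled bits can be followed on a single symbol. The sketch below repeats the encoding step with the emitted bits collected in a text buffer instead of a file; the names Sym, lo, hi, pending, bits, emit and encode_demo are all hypothetical and chosen for the illustration only.

```c
#include <string.h>

typedef struct { unsigned short low_count, high_count, scale; } Sym;

/* Hypothetical demo state: the 16-bit registers, the pending underflow
   count, and the bits emitted so far as a string of '0' and '1'. */
static unsigned short lo = 0, hi = 0xffff;
static long pending = 0;
static char bits[64];

static void emit(int bit)
{
    size_t n = strlen(bits);
    if (n + 1 < sizeof bits) {
        bits[n] = bit ? '1' : '0';
        bits[n + 1] = '\0';
    }
}

void encode_demo(const Sym *s)
{
    long range = (long)(hi - lo) + 1;

    /* narrow [lo, hi] to the sub-interval owned by the symbol */
    hi = lo + (unsigned short)((range * s->high_count) / s->scale - 1);
    lo = lo + (unsigned short)((range * s->low_count) / s->scale);

    for (;;) {
        if ((hi & 0x8000) == (lo & 0x8000)) {
            emit(hi & 0x8000);             /* top bit is settled */
            while (pending > 0) {
                emit(~hi & 0x8000);
                pending--;
            }
        } else if ((lo & 0x4000) && !(hi & 0x4000)) {
            pending++;                     /* underflow threatens */
            lo &= 0x3fff;
            hi |= 0x4000;
        } else
            return;
        lo <<= 1;
        hi <<= 1;
        hi |= 1;
    }
}
```

A symbol with low count 2, high count 3 and scale 4 owns the interval from one half to three quarters of the range. Encoding it from the initial state emits the two bits 1 and 0, after which the registers are back at their starting values.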



/*
 * At the end of the encoding process, there are still significant
 * bits left in the high and low registers. We output two bits,
 * plus as many underflow bits as are necessary.
 */
void flush_arithmetic_encoder()
{
    output_bit(low & 0x4000);
    underflow_bits++;
    while (underflow_bits-- > 0)
        output_bit(~low & 0x4000);
}


/*
 * When decoding, this routine is called to figure out which symbol
 * is presently waiting to be decoded. This routine expects to get
 * the current model scale in the s->scale parameter, and it returns
 * a count that corresponds to the present floating point code:
 *
 *     code = count / s->scale
 */
int get_current_count(SYMBOL *s)
{
    long range;
    short int count;

    range = (long)(high - low) + 1;
    count = (short int)
            ((((long)(code - low) + 1) * s->scale - 1) / range);
    return(count);
}

/*
 * This routine is called to initialize the state of the arithmetic
 * decoder. This involves initializing the high and low registers
 * to their conventional starting values, plus reading the first
 * 16 bits from the input stream into the code value.
 */
void initialize_arithmetic_decoder()
{
    int i;

    code = 0;
    for (i = 0; i < 16; i++)
    {
        code <<= 1;
        code += input_bit();
    }
    low = 0;
    high = 0xffff;
}

/*
 * Just figuring out what the present symbol is doesn't remove
 * it from the input bit stream. After the character has been
 * decoded, this routine has to be called to remove it from the
 * input stream.
 */
void remove_symbol_from_stream(SYMBOL *s)
{
    long range;

    /*
     * First, the range is expanded to account for the symbol removal.
     */
    range = (long)(high - low) + 1;
    high = low + (unsigned short int)
                 ((range * s->high_count) / s->scale - 1);
    low = low + (unsigned short int)
                 ((range * s->low_count) / s->scale);

    /*
     * Next, any possible bits are shipped out.
     */
    for ( ; ; )
    {
        /*
         * If the MSDigits match, the bits will be shifted out.
         */
        if ((high & 0x8000) == (low & 0x8000))
        {
        }
        /*
         * Else, if underflow is threatening, shift out the 2nd MSDigit.
         */
        else if ((low & 0x4000) == 0x4000 && (high & 0x4000) == 0)
        {
            code ^= 0x4000;
            low &= 0x3fff;
            high |= 0x4000;
        }
        /*
         * Otherwise, nothing can be shifted out, so I return.
         */
        else
            return;
        low <<= 1;
        high <<= 1;
        high |= 1;
        code <<= 1;
        code += input_bit();
    }
}

/*
 * Listing 1 - acoder.h
 *
 * This header file contains the constants, declarations, and
 * prototypes needed to use the arithmetic coding routines. These
 * declarations are for routines that need to interface with the
 * arithmetic coding stuff in acoder.c
 */

#define MAXIMUM_SCALE 2048 // 16383 /* Maximum allowed frequency count */
#define ESCAPE 256 /* The escape symbol */
#define DONE -1    /* The output stream empty symbol */
#define FLUSH -2   /* The symbol to flush the model */

/*
 * A symbol can either be represented as an int, or as a pair of
 * counts on a scale. This structure gives a standard way of
 * defining it as a pair of counts.
 */
typedef struct {
    unsigned short int low_count;
    unsigned short int high_count;
    unsigned short int scale;
} SYMBOL;

extern long underflow_bits; /* The present underflow count in */
                            /* the arithmetic coder. */

/*
 * Function prototypes.
 */
void initialize_arithmetic_decoder();
void remove_symbol_from_stream(SYMBOL *s);
void initialize_arithmetic_encoder(void);
void encode_symbol(SYMBOL *s);
void flush_arithmetic_encoder();
int get_current_count(SYMBOL *s);
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/* 

 * Listing 4 - abitio.c
 *
 * SOURCE: Dr. Dobbs Journal, Feb 1991 + minor modifications by
 * Arik Cordon
 *
 * This routine contains a set of bit oriented i/o routines
 * used for arithmetic data compression. The important fact to
 * know about these is that the first bit is stored in the msb of
 * the first byte of the output, like you might expect.
 *
 * Both input and output maintain a local buffer so that they only
 * have to do block reads and writes. This is done in spite of the
 * fact that C standard I/O does the same thing. If these
 * routines are ever ported to assembly language the buffering
 * will come in handy.
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "acoder.h"
#include "abitio.h"
#include "AGCMP.H"

#define BUFFER_SIZE 8192

static char *buffer;         /* This is the i/o buffer    */
static char *current_byte;   /* Pointer to current byte   */

static int output_mask;      /* During output, this byte  */
                             /* contains the mask that is */
                             /* applied to the output byte*/
                             /* if the output bit is a 1  */

static int input_bytes_left; /* During input, these three */
static int input_bits_left;  /* variables keep track of my*/
static int past_eof;         /* input state. The past_eof */
                             /* byte comes about because  */
                             /* of the fact that there is */
static long total_bytes;     /* a possibility the decoder */
                             /* can legitimately ask for  */
                             /* more bits even after the  */
                             /* entire file has been      */
                             /* sucked dry.               */

static FILE *stream;



/*
 * This routine is called once to initialize the output bitstream.
 * All it has to do is set up the current_byte pointer, clear out
 * all the bits in my current output byte, and set the output mask
 * so it will set the proper bit next time a bit is output.
 */
void initialize_output_bitstream(char *file, void *header, unsigned int header_size)
{
    buffer = malloc(BUFFER_SIZE + 2);
    if (buffer == NULL) {
        printf("\niobs:no mem\n");
        exit(9);
    }
    total_bytes = 0L;
    current_byte = buffer;
    *current_byte = 0;
    output_mask = 0x80;
    stream = fopen(file, "wb");

    setvbuf(stream, NULL, _IOFBF, 8192);
    total_bytes += fwrite(header, 1, header_size, stream);
    //printf("total_bytes = %ld\n", total_bytes);
}

/*
 * The output_bit routine just has to set a bit in the current byte
 * if requested to. After that, it updates the mask. If the mask
 * shows that the current byte is filled up, it is time to go to the
 * next character in the buffer. If the next character is past the
 * end of the buffer, it is time to flush the buffer.
 */
void output_bit(int bit)
{
    if (bit)
        *current_byte |= output_mask;
    output_mask >>= 1;
    if (output_mask == 0)
    {
        output_mask = 0x80;
        current_byte++;
        if (current_byte == (buffer + BUFFER_SIZE))
        {
            total_bytes += fwrite(buffer, 1, BUFFER_SIZE, stream);
            current_byte = buffer;
        }
        *current_byte = 0;
    }
}
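
The mask technique used by the output routine, in which the first bit lands in the most significant position of the first byte, can be shown without any file handling. The following is a minimal sketch; the BitWriter structure and the functions bw_init and bw_put are hypothetical, and the caller is assumed to supply a zero-filled buffer that is large enough.

```c
#include <stddef.h>

/* Hypothetical in-memory bit writer. */
typedef struct {
    unsigned char *buf;  /* caller-supplied, zero-filled output area */
    size_t pos;          /* index of the byte currently being filled */
    int mask;            /* 0x80 for the first bit, then 0x40, ...   */
} BitWriter;

void bw_init(BitWriter *w, unsigned char *buf)
{
    w->buf = buf;
    w->pos = 0;
    w->mask = 0x80;
}

void bw_put(BitWriter *w, int bit)
{
    if (bit)
        w->buf[w->pos] |= (unsigned char)w->mask;
    w->mask >>= 1;
    if (w->mask == 0) {      /* byte is full, move to the next one */
        w->mask = 0x80;
        w->pos++;
    }
}
```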

/*
 * When the encoding is done, there will still be a lot of bits and
 * bytes sitting in the buffer waiting to be sent out. This routine
 * is called to clean things up at that point.
 */
long flush_output_bitstream(void *header, unsigned int header_size)
{
    total_bytes += fwrite(buffer, 1, (size_t)(current_byte - buffer) + 1, stream);
    current_byte = buffer;

    fseek(stream, 0L, SEEK_SET);
    memcpy(header, &total_bytes, sizeof(long));
    fwrite(header, header_size, 1, stream);
    fclose(stream);
    free(buffer);
    _heapmin();

    return(total_bytes);
}

/*
 * Bit oriented input is set up so that the next time the input_bit
 * routine is called, it will trigger the read of a new block. That
 * is why input_bits_left is set to 0.
 */
void initialize_input_bitstream(char *file, void *header, unsigned int header_size)
{
    buffer = malloc(BUFFER_SIZE + 2);
    if (buffer == NULL) {
        printf("\niibs:no mem\n");
        exit(9);
    }
    input_bits_left = 0;
    input_bytes_left = 1;
    past_eof = 0;
    stream = fopen(file, "rb");
    setvbuf(stream, NULL, _IOFBF, 8192);
    fread(header, 1, header_size, stream);
}

close_input_bitstream()
{
    free(buffer);
    _heapmin();
    fclose(stream);
}



/*
 * This routine reads bits in from a file. The bits are all sitting
 * in a buffer, and this code pulls them out, one at a time. When the
 * buffer has been emptied, that triggers a new file read, and the
 * pointers are reset. This routine is set up to allow for two dummy
 * bytes to be read in after the end of file is reached. This is because
 * we have to keep feeding bits into the pipeline to be decoded so that
 * the old stuff that is 16 bits upstream can be pushed out.
 */
int input_bit()
{
    if (input_bits_left == 0)
    {
        current_byte++;
        input_bytes_left--;
        input_bits_left = 8;
        if (input_bytes_left == 0)
        {
            input_bytes_left = fread(buffer, 1, BUFFER_SIZE, stream);
            if (input_bytes_left == 0)
            {
                if (past_eof)
                {
                    fprintf(stderr, "Bad input file\n");
                    exit(-1);
                }
                else
                {
                    past_eof = 1;
                    input_bytes_left = 2;
                }
            }
            current_byte = buffer;
        }
    }
    input_bits_left--;
    return ((*current_byte >> input_bits_left) & 1);
}
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/* 

* Listing s - abitio.h 

 * This header file contains the function prototypes needed to use
 * the bitstream i/o routines.
 */

void initialize_output_bitstream(char *file, void *header, unsigned int header_size);
long flush_output_bitstream(void *header, unsigned int header_size);
void initialize_input_bitstream(char *file, void *header, unsigned int header_size);
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ACEXP DECOMPRESSI 



/***************************************************************************

AGEXP DECOMPRESSION UTILITY

The agexp program decompresses files created by agcmp to a
binary rasterized file (no headers) on the disk.
(File size is fixed and determined in agcmp.h)

FILES:

agexp.c - the main loop for decompression. Retrieves MR codes
          from the arithmetic coder and re-generates the raw
          binary file.

The following sources are common to both programs - agcmp and
agexp (Decompression) and handle the statistical estimation
(element frequency accumulation) and the arithmetic coding:

amdl.c - Statistical estimation. Based on a source from Dr. Dobbs
         Journal, February 1991, "Arithmetic Coding and Statistical
         Modeling" by Mark R. Nelson, but modified to fit compression
         of MR codes.

acoder.c, abitio.c - implement the arithmetic coder, based on Dr. Dobbs
         Journal.

COMPILATION:

agexp: cc agexp.c amdl.c acoder.c abitio.c

FURTHER INFORMATION about agexp.c:

AUTHOR: Arik Cordon

INPUT:  compressed file.
OUTPUT: A rasterized file (No headers!) with 1728 binary pixels per line
USAGE:  agexp COMPRESSED_FILE_NAME RASTER_FILE_NAME
DESC:   This is the main loop for the agexp utility. It makes calls to the
        arithmetic coder to retrieve the MR codes, and then builds a
        rasterized binary image.
***************************************************************************/ 



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <memory.h>
#include <malloc.h>
#include <sys\types.h>
#include <sys\stat.h>
// #include <dos.h>



Page 1 



C:™K\COMPRESS\PTNTSRC\ACEXP.C - Sun Aug 28 07:04:42 1994 



#include "acoder.h"
#include "amodel.h"
#include "abitio.h"
#include "agcmp.h"

find_next(int color, int pos, char *line, int len);
uncompress_strip_and_save(unsigned char *compressed, long compressed_size, int fdo);
mr_uncompress(unsigned char *line, unsigned char *prev);
void huf_uncompress(unsigned char *line, unsigned char *str);
void get_bit_stream(char *str, char *bufi, long compressed_size);
int pack8(unsigned char *line, unsigned char *buf);
int find_b1(int a0_color, int a0, char *line, int length);
void vertical_code(int *a0_color, int *a0, char *prev, int length, char *curr, int offset);
int find_huf_len(int *a0_color);

#define STRIP_SIZE 100 // can be any number, determines buffer size



main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "\nUsage: %s G3_output_file_name IMG_file_name \n", argv[0]);
        exit(9);
    }
    agexp(argv[1], argv[2]);
}

agexp(char *infile, char *outfile)
{
    char line[PELS_PER_LINE], prev_line[PELS_PER_LINE], *bufo;
    int fdo;
    unsigned char *compressed;
    long compressed_size; // in bits
    int j = 0, line_num = 0;
    AG_HEADER ag_header;

    if ((fdo = open(outfile, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IREAD | S_IWRITE)) < 1)
        BigErr(0, "AGEXP: can't open outfile");

    if ((bufo = malloc(STRIP_SIZE*BYTES_PER_LINE)) == NULL)
        BigErr(0, "AGEXP: no mem");

    memset(prev_line, 0, PELS_PER_LINE);

    initialize_model();
    init_mr_model();

    initialize_input_bitstream(infile, &ag_header, sizeof(ag_header));
    initialize_arithmetic_decoder();
    init_get_1();

    printf("LINES: %ld total %ld\n", ag_header.number_of_lines_in_file, ag_header.total_bytes);

    while (mr_uncompress(line, prev_line) != -1) {
        if (line_num++ % 100 == 0)
            printf("line %d\r", line_num-1);
        memcpy(prev_line, line, PELS_PER_LINE);
        pack8(line, bufo + (j++)*BYTES_PER_LINE);

        if (j == STRIP_SIZE) {
            write(fdo, bufo, BYTES_PER_LINE*STRIP_SIZE);
            j = 0;
        }
    }

    if (j != 0)
        write(fdo, bufo, BYTES_PER_LINE*j);

    free_amdl_bufs();
    close_get_1();
    free(bufo);
    close(fdo);
}



//////////////////^ 

/** This loop decompresses one rasterized line ! **/
mr_uncompress(unsigned char *line, unsigned char *prev)
{
    int a0_color = WHITE, b1, b2;
    int a0 = 0, Ma0a1, Ma1a2, code;

    line[0] = WHITE; // force a white pixel on line beginning

    while (a0 < PELS_PER_LINE) { // while not EOL
        code = get_1(MR_CONTROL);
        if (code == EOF)
            return(-1);
        switch (code) {
        case V0:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 0);
            break;
        case VR1:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 1);
            break;
        case VL1:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -1);
            break;
        case HOR:
            Ma0a1 = find_huf_len(&a0_color); // gpos is globally known
            Ma1a2 = find_huf_len(&a0_color);
            memset(line + a0, a0_color, Ma0a1);
            memset(line + a0 + Ma0a1, !a0_color, Ma1a2);
            a0 += (Ma0a1 + Ma1a2);
            break;
        case PASS:
            b1 = find_b1(a0_color, a0, prev, PELS_PER_LINE);
            b2 = find_next(a0_color, b1+1, prev, PELS_PER_LINE);
            memset(line + a0, a0_color, b2-a0);
            a0 = b2;
            break;
        case VR2:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 2);
            break;
        case VL2:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -2);
            break;
        case VR3:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, 3);
            break;
        case VL3:
            vertical_code(&a0_color, &a0, prev, PELS_PER_LINE, line, -3);
            break;
        }
    }

    return(1);
}

find_b1(int a0_color, int a0, char *line, int length)
{
    int b1;

    if (line[a0] == a0_color)
        b1 = find_next(!a0_color, a0+1, line, length);
    else {
        b1 = find_next(a0_color, a0+1, line, length);
        b1 = find_next(!a0_color, b1+1, line, length);
    }
    return(b1);
}

/* Builds partial rasterized line according to MR codes */
void vertical_code(int *a0_color, int *a0, char *prev, int length, char *curr, int offset)
{
    int a1, b1;

    b1 = find_b1(*a0_color, *a0, prev, PELS_PER_LINE);
    //printf("(b1=%d)\n", b1);

    a1 = b1 + offset;

    /* fill the run from a0 up to a1 with the current color */
    memset(curr + *a0, *a0_color, a1 - *a0);

    *a0_color = !(*a0_color);
    *a0 = a1;
}

find_huf_len(int *a0_color)
{
    int len;

    len = BW_SYMBOLS - 1 - get_1(*a0_color);

    if (len > 63)
        len = (len - 63) * 64;

    if (len < 64) {
        *a0_color = !*a0_color;
        return(len);
    } else
        return(len + find_huf_len(a0_color));
}

find_next(int color, int pos, char *line, int len)
{
    int i;
    char *ptr;

    if (pos > len-1)
        return(len);

    if ((ptr = memchr(line + pos, color, len-pos)) == NULL)
        return len;
    else
        return (ptr-line);
}

BigErr(int n, char *s) // too many bits in strip.
{
    printf("Err %d - %s", n, s);
    exit(9);
}



static int *count, prev, prev1;

/***** Arithmetic decoder stuff *****/
init_get_1()
{
    count = malloc(sizeof(int) * 3); // mr + b&w
    memset(count, 0, sizeof(int) * 3);
    prev = prev1 = 0;
}

/***** Arithmetic decoder stuff *****/
close_get_1()
{
    free(count);
    close_input_bitstream();
}

/***** gets one symbol from the arithmetic coder */
get_1(int mode)
{
    SYMBOL s;
    int c;

    get_symbol_scale(&s, mode, prev, prev1);
    count[mode] = get_current_count(&s);
    c = convert_symbol_to_int(count[mode], &s);

    if (mode == MR_CONTROL) {
        prev1 = prev;
        prev = c;
    }

    remove_symbol_from_stream(&s);

    if (c != EOF)
        update_model(c);
    return(c);
}



static pack_bytes, byte;

/*** routines for packing bytes to bits (for output) ***/
pack8(unsigned char *line, unsigned char *buf)
{
    int i = 0, j, k, color, new_pos, pos = 0, a_bits, count = 0;

    pack_bytes = 0;
    byte = 0;
    color = line[0];

    while ((new_pos = find_next(!color, pos, line, PELS_PER_LINE)) != PELS_PER_LINE) {
        pack_n_bits(color, new_pos - pos, &count, buf);
        pos = new_pos;
        color = !color;
    }
    pack_n_bits(color, PELS_PER_LINE - pos, &count, buf);
}
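
For comparison, the same packing result can be obtained one pixel at a time instead of one run at a time. The sketch below is a simpler substitute for illustration, not the run-based method of pack8; the name pack_line is hypothetical, and n is assumed to be a multiple of 8.

```c
/* Hypothetical pixel-by-pixel packer: n pixels (one per byte, 0 or 1)
   become n / 8 output bytes, with the leftmost pixel in the MSB. */
void pack_line(const unsigned char *pels, int n, unsigned char *out)
{
    int i;

    for (i = 0; i < n / 8; i++)
        out[i] = 0;
    for (i = 0; i < n; i++)
        if (pels[i])
            out[i / 8] |= (unsigned char)(0x80 >> (i % 8));
}
```

The run-based version in the listing touches each output byte far fewer times on typical fax lines, which consist of long runs of a single color.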

pack_n_bits(int color, int n, int *count, char *buf)
{
    int bits;
    static b_table[] = {0,1,3,7,15,31,63,127,255};

    while ((*count + n) > 8) {
        if (*count != 0) {
            bits = 8 - *count;
            byte = (byte << bits);
            if (color)
                byte += ((color << bits) - 1);
            buf[pack_bytes] = byte;
            pack_bytes++;
            n -= bits;
            *count = 0;
        } else {
            if (color)
                byte = 255;
            else
                byte = 0;
            buf[pack_bytes] = byte;
            pack_bytes++;
            n -= 8;
        }
    }

    byte = (byte << n);
    if (color)
        byte += b_table[n];
    (*count) += n;

    if (*count == 8) {
        buf[pack_bytes] = byte;
        pack_bytes++;
        *count = 0;
    }
}
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